Python Pandas: pandas.read_csv() — introduction, concrete examples, and detailed usage guide
Contents

Introduction to read_csv()

Introduction to read_csv()
read_csv can read not only .csv files but also plain .txt files directly (by default it expects comma-separated content in the txt file).
pd.read_csv('data.csv')

pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)

Official documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html?highlight=read_csv#pandas.read_csv
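As a minimal sketch of the point above — read_csv handles non-comma text files via the sep parameter — here is an example using an in-memory StringIO in place of a real .txt file (the data and column names are made up for illustration):

```python
import io
import pandas as pd

# Hypothetical tab-separated content standing in for a .txt file on disk.
data = io.StringIO("name\tage\nAlice\t30\nBob\t25\n")

df = pd.read_csv(data, sep='\t')  # custom delimiter for non-comma files
print(df.shape)  # (2, 2)
```

For a real file you would pass the path instead: pd.read_csv('data.txt', sep='\t').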
Read a comma-separated values (csv) file into DataFrame. Also supports optionally iterating or breaking the file into chunks. Additional help can be found in the online docs for IO Tools.
Parameters

filepath_or_buffer: str, path object or file-like object
    Any valid string path is acceptable. The string could be a URL; valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected; a local file could be file://localhost/path/to/table.csv. If you want to pass in a path object, pandas accepts any os.PathLike. By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.

sep: str, default ','
    Delimiter to use, e.g. sep='\t' for tab-separated files. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and will detect the separator with Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than one character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

delimiter: str, default None
    Alias for sep.

header: int, list of int, default 'infer'
    Row number(s) to use as the column names, and the start of the data. The default behavior is to infer the column names: if no names are passed, the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly, the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns, e.g. [0, 1, 3]; intervening rows that are not specified will be skipped (e.g. 2 in this example). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

names: array-like, optional
    List of column names to use. If the file contains a header row, you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.

index_col: int, str, sequence of int / str, or False, default None
    Column(s) to use as the row labels of the DataFrame, given either as string name or column index. If a sequence of int / str is given, a MultiIndex is used. Note: index_col=False can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.

usecols: list-like or callable, optional
    Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved, use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order, or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order. If callable, the callable function will be evaluated against the column names, returning names where the function evaluates to True; an example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.

squeeze: bool, default False
    If the parsed data only contains one column, return a Series.

prefix: str, optional
    Prefix to add to column numbers when there is no header, e.g. 'X' for X0, X1, …

mangle_dupe_cols: bool, default True
    Duplicate columns will be named 'X', 'X.1', … 'X.N', rather than 'X' … 'X'. Passing False will cause data to be overwritten if there are duplicate names in the columns.

dtype: type name or dict of column -> type, optional
    Data type for data or columns, e.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'}; types can be fixed at read time. Use str or object together with suitable na_values settings to preserve values and not interpret the dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.

engine: {'c', 'python'}, optional
    Parser engine to use. The C engine is faster, while the Python engine is currently more feature-complete.

converters: dict, optional
    Dict of functions for converting values in certain columns. Keys can be either integers or column labels.

true_values: list, optional
    Values to consider as True.
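The column-selection and typing parameters above (usecols, dtype) can be sketched with a small inline CSV (the data and column names are invented for illustration):

```python
import io
import pandas as pd

csv_text = "a,b,c\n1,2,3\n4,5,6\n"

# Keep only columns 'a' and 'c', and force 'a' to float64 at read time.
df = pd.read_csv(io.StringIO(csv_text), usecols=['a', 'c'], dtype={'a': 'float64'})
print(df.dtypes['a'])  # float64
```

Note that usecols order is ignored: usecols=['c', 'a'] returns the same frame; re-index with df[['c', 'a']] if a specific column order matters.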
false_values: list, optional
    Values to consider as False.

skipinitialspace: bool, default False
    Skip spaces after the delimiter.

skiprows: list-like, int or callable, optional
    Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. If callable, the function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise; an example of a valid callable argument would be lambda x: x in [0, 2].

skipfooter: int, default 0
    Number of lines at the bottom of the file to skip (unsupported with engine='c').

nrows: int, optional
    Number of rows of the file to read. Useful for reading pieces of large files; e.g. nrows=2000 reads only the first 2000 data rows when testing on a sample.

na_values: scalar, str, list-like, or dict, optional
    Additional strings to recognize as NA/NaN. If a dict is passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.

keep_default_na: bool, default True
    Whether or not to include the default NaN values when parsing the data. The behavior depends on whether na_values is passed in. Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.

na_filter: bool, default True
    Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file.

verbose: bool, default False
    Indicate the number of NA values placed in non-numeric columns.

skip_blank_lines: bool, default True
    If True, skip over blank lines rather than interpreting them as NaN values.

parse_dates: bool or list of int or names or list of lists or dict, default False
    If a column or index cannot be represented as an array of datetimes, say because of an unparseable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See "Parsing a CSV with mixed timezones" for more. Note: a fast-path exists for iso8601-formatted dates.

infer_datetime_format: bool, default False
    If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.

keep_date_col: bool, default False
    If True and parse_dates specifies combining multiple columns, keep the original columns.

date_parser: function, optional
    Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. Pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.

dayfirst: bool, default False
    DD/MM format dates, international and European format.
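Several of the parameters above (nrows, na_values, parse_dates) combine naturally when sampling a file. A minimal sketch, with the CSV content and column names invented for illustration:

```python
import io
import pandas as pd

csv_text = "date,value\n2021-01-01,10\n2021-01-02,missing\n2021-01-03,30\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=['date'],    # convert this column to datetime64
    na_values=['missing'],   # additionally treat the string 'missing' as NaN
    nrows=2,                 # read only the first two data rows
)
```

Here the 'value' column becomes float64 because the NaN introduced by na_values cannot be stored in an integer column.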
cache_dates: bool, default True
    If True, use a cache of unique, converted dates to apply the datetime conversion. May produce a significant speed-up when parsing duplicate date strings, especially ones with timezone offsets. New in version 0.25.0.

iterator: bool, default False
    Return a TextFileReader object for iteration or for getting chunks with get_chunk().

chunksize: int, optional
    Return a TextFileReader object for iteration. See the IO Tools docs for more information on iterator and chunksize.

compression: {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
    For on-the-fly decompression of on-disk data. If 'infer' and filepath_or_buffer is path-like, detect compression from the following extensions: '.gz', '.bz2', '.zip', or '.xz' (otherwise no decompression). If using 'zip', the ZIP file must contain only one data file to be read in. Set to None for no decompression.

thousands: str, optional
    Thousands separator.

decimal: str, default '.'
    Character to recognize as the decimal point (e.g. use ',' for European data).

lineterminator: str (length 1), optional
    Character to break the file into lines. Only valid with the C parser.

quotechar: str (length 1), optional
    The character used to denote the start and end of a quoted item. Quoted items can include the delimiter, and it will be ignored.

quoting: int or csv.QUOTE_* instance, default 0
    Control field quoting behavior per the csv.QUOTE_* constants. Use one of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).

doublequote: bool, default True
    When quotechar is specified and quoting is not QUOTE_NONE, indicate whether or not to interpret two consecutive quotechar elements INSIDE a field as a single quotechar element.

escapechar: str (length 1), optional
    One-character string used to escape other characters.

comment: str, optional
    Indicates that the remainder of the line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in 'a,b,c' being treated as the header.

encoding: str, optional
    Encoding to use for UTF when reading/writing (e.g. 'utf-8'). See the list of Python standard encodings. If Chinese text in the file comes out garbled, pass encoding='utf-8' explicitly.

dialect: str or csv.Dialect, optional
    If provided, this parameter will override values (default or not) for the following parameters: delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting. If it is necessary to override values, a ParserWarning will be issued. See the csv.Dialect documentation for more details.

error_bad_lines: bool, default True
    Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, these "bad lines" will be dropped from the DataFrame that is returned.

warn_bad_lines: bool, default True
    If error_bad_lines is False and warn_bad_lines is True, a warning will be output for each "bad line".

delim_whitespace: bool, default False
    Specifies whether or not whitespace (e.g. ' ' or '    ') will be used as the sep. Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter.

low_memory: bool, default True
    Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types, either set False or specify the type with the dtype parameter. Note that the entire file is read into a single DataFrame regardless; use the chunksize or iterator parameter to return the data in chunks. (Only valid with the C parser.)

memory_map: bool, default False
    If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.

float_precision: str, optional
    Specifies which converter the C engine should use for floating-point values. The options are None for the ordinary converter, high for the high-precision converter, and round_trip for the round-trip converter.
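The chunked-reading behavior described above (chunksize returning a TextFileReader you iterate over) can be sketched as follows, with a small inline CSV standing in for a large file:

```python
import io
import pandas as pd

# Ten rows of a single hypothetical column 'x' (values 0..9).
csv_text = "x\n" + "\n".join(str(i) for i in range(10)) + "\n"

# With chunksize, read_csv yields DataFrames of at most 4 rows each
# instead of loading everything into one frame.
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk['x'].sum()
print(total)  # 45
```

The same pattern is what makes processing multi-gigabyte CSVs feasible: each chunk is aggregated and discarded before the next is read.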
Returns

DataFrame or TextParser
    A comma-separated values (csv) file is returned as a two-dimensional data structure with labeled axes.