Parameters
----------
estimator : estimator object
    This is assumed to implement the scikit-learn estimator interface.
    Either the estimator needs to provide a ``score`` function, or
    ``scoring`` must be passed.

param_grid : dict or list of dictionaries
    Dictionary with parameter names (``str``) as keys and lists of
    parameter settings to try as values, or a list of such dictionaries,
    in which case the grids spanned by each dictionary in the list are
    explored. This enables searching over any sequence of parameter
    settings.

scoring : str, callable, list/tuple or dict, default=None
    A single str (see :ref:`scoring_parameter`) or a callable
    (see :ref:`scoring`) to evaluate the predictions on the test set.

    For evaluating multiple metrics, either give a list of (unique)
    strings or a dict with names as keys and callables as values.

    NOTE that when using custom scorers, each scorer should return a
    single value. Metric functions returning a list/array of values can
    be wrapped into multiple scorers that return one value each.

    See :ref:`multimetric_grid_search` for an example.

    If None, the estimator's score method is used.

n_jobs : int, default=None
    Number of jobs to run in parallel.
    ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
    ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
    for more details.

    .. versionchanged:: v0.20
       `n_jobs` default changed from 1 to None

pre_dispatch : int, or str, default='2*n_jobs'
    Controls the number of jobs that get dispatched during parallel
    execution. Reducing this number can be useful to avoid an explosion
    of memory consumption when more jobs get dispatched than CPUs can
    process. This parameter can be:

    - None, in which case all the jobs are immediately created and
      spawned. Use this for lightweight and fast-running jobs, to avoid
      delays due to on-demand spawning of the jobs
    - An int, giving the exact number of total jobs that are spawned
    - A str, giving an expression as a function of n_jobs, as in
      '2*n_jobs'

iid : bool, default=False
    If True, return the average score across folds, weighted by the
    number of samples in each test set. In this case, the data is
    assumed to be identically distributed across the folds, and the loss
    minimized is the total loss per sample, and not the mean loss across
    the folds.

    .. deprecated:: 0.22
        Parameter ``iid`` is deprecated in 0.22 and will be removed in 0.24

cv : int, cross-validation generator or an iterable, default=None
    Determines the cross-validation splitting strategy.
    Possible inputs for cv are:

    - None, to use the default 5-fold cross validation,
    - integer, to specify the number of folds in a `(Stratified)KFold`,
    - :term:`CV splitter`,
    - An iterable yielding (train, test) splits as arrays of indices.

    For integer/None inputs, if the estimator is a classifier and ``y`` is
    either binary or multiclass, :class:`StratifiedKFold` is used. In all
    other cases, :class:`KFold` is used.

    Refer to the :ref:`User Guide <cross_validation>` for the various
    cross-validation strategies that can be used here.

    .. versionchanged:: 0.22
        ``cv`` default value if None changed from 3-fold to 5-fold.

refit : bool, str, or callable, default=True
    Refit an estimator using the best found parameters on the whole
    dataset.

    For multiple metric evaluation, this needs to be a `str` denoting the
    scorer that would be used to find the best parameters for refitting
    the estimator at the end.

    Where there are considerations other than maximum score in choosing
    a best estimator, ``refit`` can be set to a function which returns
    the selected ``best_index_`` given ``cv_results_``. In that case, the
    ``best_estimator_`` and ``best_params_`` will be set according to the
    returned ``best_index_`` while the ``best_score_`` attribute will not
    be available.

    The refitted estimator is made available at the ``best_estimator_``
    attribute and permits using ``predict`` directly on this
    ``GridSearchCV`` instance.

    Also for multiple metric evaluation, the attributes ``best_index_``,
    ``best_score_`` and ``best_params_`` will only be available if
    ``refit`` is set and all of them will be determined with respect to
    this specific scorer.

    See the ``scoring`` parameter to know more about multiple metric
    evaluation.

    .. versionchanged:: 0.20
        Support for callable added.

verbose : integer
    Controls the verbosity: the higher, the more messages.

error_score : 'raise' or numeric, default=np.nan
    Value to assign to the score if an error occurs in estimator fitting.
    If set to 'raise', the error is raised. If a numeric value is given,
    a ``FitFailedWarning`` is raised. This parameter does not affect the
    refit step, which will always raise the error.

return_train_score : bool, default=False
    If ``False``, the ``cv_results_`` attribute will not include training
    scores. Computing training scores is used to get insights on how
    different parameter settings impact the overfitting/underfitting
    trade-off. However, computing the scores on the training set can be
    computationally expensive and is not strictly required to select the
    parameters that yield the best generalization performance.

    .. versionadded:: 0.19

    .. versionchanged:: 0.21
        Default value was changed from ``True`` to ``False``
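A short pure-Python sketch can show the candidate set that ``param_grid`` describes. ``expand_grid`` is a hypothetical helper written for this page. It is not scikit-learn's :class:`ParameterGrid`. It assumes only what the parameter description states: each dict is expanded into the Cartesian product of its value lists, and a list of dicts is explored one dict at a time. The example grid produces the four candidates shown in the ``cv_results_`` table under Attributes.

```python
from itertools import product


# Hypothetical helper (not scikit-learn's ParameterGrid): enumerate every
# candidate setting spanned by a dict, or by a list of dicts, of value lists.
def expand_grid(param_grid):
    if isinstance(param_grid, dict):
        param_grid = [param_grid]
    candidates = []
    for grid in param_grid:
        # Sort the keys so the enumeration order is deterministic.
        keys = sorted(grid)
        # Each sub-grid contributes the Cartesian product of its value lists.
        for values in product(*(grid[k] for k in keys)):
            candidates.append(dict(zip(keys, values)))
    return candidates


# Two sub-grids, so 'gamma' is never combined with the 'poly' kernel
# and 'degree' is never combined with the 'rbf' kernel.
param_grid = [
    {"kernel": ["poly"], "degree": [2, 3]},
    {"kernel": ["rbf"], "gamma": [0.1, 0.2]},
]
for candidate in expand_grid(param_grid):
    print(candidate)
```

Splitting the grid into two dicts keeps ``gamma`` out of the ``'poly'`` candidates. A single dict containing all three keys would instead produce every combination. With scikit-learn installed, ``list(ParameterGrid(param_grid))`` should enumerate the same four settings.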
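The callable form of ``refit`` receives ``cv_results_`` and returns an index, which becomes ``best_index_``. Below is a minimal sketch of such a callable. The function name ``prefer_fast_within_one_std`` and its selection rule are illustrative assumptions, not a rule that scikit-learn ships. The rule is: among candidates whose mean test score lies within one standard deviation of the best mean, prefer the one with the lowest ``mean_score_time``. The sketch assumes single-metric scoring, so it reads the ``mean_test_score`` and ``std_test_score`` keys. With multiple metrics, those keys would end in the scorer's name instead.

```python
# Hypothetical refit callable: takes cv_results_ and returns the chosen index.
# Rule (an assumption for illustration): among candidates whose mean test
# score is within one standard deviation of the best mean, pick the one with
# the lowest mean score (prediction) time.
def prefer_fast_within_one_std(cv_results):
    means = list(cv_results["mean_test_score"])
    stds = list(cv_results["std_test_score"])
    score_times = list(cv_results["mean_score_time"])
    best = max(range(len(means)), key=lambda i: means[i])
    threshold = means[best] - stds[best]
    eligible = [i for i in range(len(means)) if means[i] >= threshold]
    return min(eligible, key=lambda i: score_times[i])


# Columns copied from the example cv_results_ dict in the Attributes section.
cv_results = {
    "mean_test_score": [0.81, 0.60, 0.75, 0.85],
    "std_test_score": [0.01, 0.10, 0.05, 0.08],
    "mean_score_time": [0.01, 0.06, 0.04, 0.04],
}
print(prefer_fast_within_one_std(cv_results))
```

On these columns, the best mean is 0.85 (index 3) and the threshold is about 0.77. Of the two eligible candidates (indices 0 and 3), index 0 has the lower score time, so the function returns 0. The callable would be passed as ``refit=prefer_fast_within_one_std``, and in that case ``best_score_`` is not set.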
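The rule given for integer and None values of ``cv`` can be written as a small decision function. ``resolve_cv`` is hypothetical. It returns a (splitter name, number of folds) pair instead of a real splitter object. Determining the target type of ``y`` (for example ``'binary'`` or ``'multiclass'``) is left to the caller, which is an assumption of this sketch.

```python
# Hypothetical sketch of the documented rule for integer/None cv values.
# It returns a (splitter_name, n_splits) pair rather than a real splitter.
def resolve_cv(cv=None, is_classifier=False, y_type="continuous"):
    if cv is None:
        cv = 5  # default number of folds (changed from 3 to 5 in 0.22)
    if isinstance(cv, bool) or not isinstance(cv, int):
        raise TypeError("this sketch only models integer/None inputs")
    # Stratify only for classifiers with a binary or multiclass target.
    if is_classifier and y_type in ("binary", "multiclass"):
        return ("StratifiedKFold", cv)
    return ("KFold", cv)


print(resolve_cv())
print(resolve_cv(3, is_classifier=True, y_type="binary"))
print(resolve_cv(3, is_classifier=True, y_type="multilabel-indicator"))
```

In scikit-learn, the corresponding logic lives in :func:`sklearn.model_selection.check_cv`. That function also accepts splitter objects and iterables of (train, test) index arrays, which this sketch deliberately rejects.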
Attributes
----------
cv_results_ : dict of numpy (masked) ndarrays
    A dict with keys as column headers and values as columns, that can be
    imported into a pandas ``DataFrame``.

    For instance, the table below

    +------------+-----------+------------+-----------------+---+---------+
    |param_kernel|param_gamma|param_degree|split0_test_score|...|rank_t...|
    +============+===========+============+=================+===+=========+
    |  'poly'    |     --    |      2     |       0.80      |...|    2    |
    +------------+-----------+------------+-----------------+---+---------+
    |  'poly'    |     --    |      3     |       0.70      |...|    4    |
    +------------+-----------+------------+-----------------+---+---------+
    |  'rbf'     |     0.1   |     --     |       0.80      |...|    3    |
    +------------+-----------+------------+-----------------+---+---------+
    |  'rbf'     |     0.2   |     --     |       0.93      |...|    1    |
    +------------+-----------+------------+-----------------+---+---------+

    will be represented by a ``cv_results_`` dict of::

        {
        'param_kernel': masked_array(data = ['poly', 'poly', 'rbf', 'rbf'],
                                     mask = [False False False False]...),
        'param_gamma': masked_array(data = [-- -- 0.1 0.2],
                                    mask = [ True  True False False]...),
        'param_degree': masked_array(data = [2.0 3.0 -- --],
                                     mask = [False False  True  True]...),
        'split0_test_score'  : [0.80, 0.70, 0.80, 0.93],
        'split1_test_score'  : [0.82, 0.50, 0.70, 0.78],
        'mean_test_score'    : [0.81, 0.60, 0.75, 0.85],
        'std_test_score'     : [0.01, 0.10, 0.05, 0.08],
        'rank_test_score'    : [2, 4, 3, 1],
        'split0_train_score' : [0.80, 0.92, 0.70, 0.93],
        'split1_train_score' : [0.82, 0.55, 0.70, 0.87],
        'mean_train_score'   : [0.81, 0.74, 0.70, 0.90],
        'std_train_score'    : [0.01, 0.19, 0.00, 0.03],
        'mean_fit_time'      : [0.73, 0.63, 0.43, 0.49],
        'std_fit_time'       : [0.01, 0.02, 0.01, 0.01],
        'mean_score_time'    : [0.01, 0.06, 0.04, 0.04],
        'std_score_time'     : [0.00, 0.00, 0.00, 0.01],
        'params'             : [{'kernel': 'poly', 'degree': 2}, ...],
        }

    NOTE

    The key ``'params'`` is used to store a list of parameter
    settings dicts for all the parameter candidates.

    The ``mean_fit_time``, ``std_fit_time``, ``mean_score_time`` and
    ``std_score_time`` are all in seconds.

    For multi-metric evaluation, the scores for all the scorers are
    available in the ``cv_results_`` dict at the keys ending with that
    scorer's name (``'_<scorer_name>'``) instead of ``'_score'`` shown
    above (e.g. ``'split0_test_precision'``, ``'mean_train_precision'``).

best_estimator_ : estimator
    Estimator that was chosen by the search, i.e. the estimator which gave
    the highest score (or smallest loss if specified) on the left-out data.
    Not available if ``refit=False``.

    See the ``refit`` parameter for more information on allowed values.

best_score_ : float
    Mean cross-validated score of the ``best_estimator_``.

    For multi-metric evaluation, this is present only if ``refit`` is
    specified.

    This attribute is not available if ``refit`` is a function.

best_params_ : dict
    Parameter setting that gave the best results on the held-out data.

    For multi-metric evaluation, this is present only if ``refit`` is
    specified.

best_index_ : int
    The index (of the ``cv_results_`` arrays) which corresponds to the best
    candidate parameter setting.

    The dict at ``search.cv_results_['params'][search.best_index_]`` gives
    the parameter setting for the best model, that is, the one with the
    highest mean score (``search.best_score_``).

    For multi-metric evaluation, this is present only if ``refit`` is
    specified.

scorer_ : function or a dict
    Scorer function used on the held-out data to choose the best
    parameters for the model.

    For multi-metric evaluation, this attribute holds the validated
    ``scoring`` dict which maps the scorer key to the scorer callable.

n_splits_ : int
    The number of cross-validation splits (folds/iterations).

refit_time_ : float
    Seconds used for refitting the best model on the whole dataset.

    This is present only if ``refit`` is not False.

    .. versionadded:: 0.20

Notes
-----
The parameters selected are those that maximize the score of the left-out
data, unless an explicit score is passed, in which case it is used instead.

If `n_jobs` was set to a value higher than one, the data is copied for each
point in the grid (and not `n_jobs` times). This is done for efficiency
reasons if individual jobs take very little time, but may raise errors if
the dataset is large and not enough memory is available. A workaround in
this case is to set `pre_dispatch`. Then, the memory is copied only
`pre_dispatch` many times. A reasonable value for `pre_dispatch` is
`2 * n_jobs`.

See Also
--------
:class:`ParameterGrid`:
    Generates all the combinations of a hyperparameter grid.

:func:`sklearn.model_selection.train_test_split`:
    Utility function to split the data into a development set usable for
    fitting a GridSearchCV instance and an evaluation set for its final
    evaluation.

:func:`sklearn.metrics.make_scorer`:
    Make a scorer from a performance metric or loss function.

"""
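The aggregate columns of ``cv_results_`` follow from the per-split scores. The sketch below uses a hypothetical ``summarize_splits`` helper to recompute ``mean_test_score``, ``std_test_score`` and ``rank_test_score`` from the two split columns of the example dict. It makes three assumptions:

- averaging is unweighted;
- the standard deviation is the population one, which is what matches the example's ``std_test_score`` values;
- tied candidates share the lowest rank.

```python
from statistics import mean, pstdev


# Hypothetical helper: aggregate per-split scores into cv_results_-style
# columns. split_scores holds one list per split (like split0_test_score,
# split1_test_score, ...), each with one value per candidate.
def summarize_splits(split_scores):
    per_candidate = list(zip(*split_scores))
    means = [mean(scores) for scores in per_candidate]
    # Population standard deviation (ddof=0) across the splits.
    stds = [pstdev(scores) for scores in per_candidate]
    # Rank 1 is best; tied candidates share the lowest ("min") rank.
    ranks = [1 + sum(other > m for other in means) for m in means]
    return {"mean_test_score": means, "std_test_score": stds,
            "rank_test_score": ranks}


summary = summarize_splits([
    [0.80, 0.70, 0.80, 0.93],  # split0_test_score
    [0.82, 0.50, 0.70, 0.78],  # split1_test_score
])
print([round(m, 3) for m in summary["mean_test_score"]])
print(summary["rank_test_score"])
```

This reproduces the example's ``rank_test_score`` of ``[2, 4, 3, 1]``. The last candidate's mean comes out as 0.855, which the example shows as 0.85.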
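Because ``cv_results_`` is a dict of equal-length columns, each index describes one candidate, and ``best_index_`` selects a row across all columns. The sketch below uses a hypothetical ``to_rows`` helper to transpose a trimmed copy of the example dict into rows. It then looks up the best candidate by its rank, assuming single-metric scoring and a non-callable ``refit``. With a fitted search and pandas available, ``pandas.DataFrame(search.cv_results_)`` gives the same row view as a table.

```python
# Hypothetical helper: turn a dict of equal-length columns into row dicts.
def to_rows(cv_results):
    columns = list(cv_results)
    n_rows = len(cv_results[columns[0]])
    return [{col: cv_results[col][i] for col in columns} for i in range(n_rows)]


# Trimmed copy of the example cv_results_ dict from the Attributes section.
cv_results = {
    "params": [
        {"kernel": "poly", "degree": 2},
        {"kernel": "poly", "degree": 3},
        {"kernel": "rbf", "gamma": 0.1},
        {"kernel": "rbf", "gamma": 0.2},
    ],
    "mean_test_score": [0.81, 0.60, 0.75, 0.85],
    "rank_test_score": [2, 4, 3, 1],
}

# The best index is the position of the rank-1 candidate (first one on ties).
ranks = cv_results["rank_test_score"]
best_index = min(range(len(ranks)), key=lambda i: ranks[i])
best_params = cv_results["params"][best_index]
best_score = cv_results["mean_test_score"][best_index]

for row in sorted(to_rows(cv_results), key=lambda r: r["rank_test_score"]):
    print(row["rank_test_score"], row["params"], row["mean_test_score"])
print(best_index, best_params, best_score)
```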
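The See Also entries suggest a common workflow. First hold out an evaluation set with :func:`sklearn.model_selection.train_test_split`. Then tune on the rest with ``GridSearchCV``, and finally score the refitted model once on the held-out data. The sketch below assumes a recent scikit-learn release and does not pass the deprecated ``iid`` parameter. The iris data, the :class:`sklearn.svm.SVC` estimator and the grid are illustrative choices. It passes a list of scorer names, so ``refit`` must name the scorer that selects ``best_params_``.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Hold out an evaluation set that the grid search never sees.
X_dev, X_eval, y_dev, y_eval = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

param_grid = [
    {"kernel": ["poly"], "degree": [2, 3]},
    {"kernel": ["rbf"], "gamma": [0.1, 0.2]},
]
search = GridSearchCV(
    SVC(),
    param_grid,
    # Two metrics: cv_results_ gets *_accuracy and *_f1_macro keys.
    scoring=["accuracy", "f1_macro"],
    # With multiple metrics, refit names the scorer used to pick best_params_.
    refit="accuracy",
    cv=5,
    return_train_score=True,
)
search.fit(X_dev, y_dev)

print(search.best_params_)
print(search.cv_results_["mean_test_f1_macro"])
# predict() delegates to the refitted best_estimator_.
eval_accuracy = accuracy_score(y_eval, search.predict(X_eval))
print(eval_accuracy)
```

Because ``refit`` is a string here, ``best_score_`` is available and equals ``cv_results_['mean_test_accuracy'][best_index_]``. With ``return_train_score=True``, the ``*_train_accuracy`` and ``*_train_f1_macro`` columns are included as well.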