Add Malicious Webpage Detection Example by edencfc · Pull Request #976 · PaddlePaddle/book

edencfc · 2021-05-13T09:52:59Z

Add Malicious Webpage Detection Example by PaddleNLP

TCChenlong

A few things need fixing; I've left comments. Please update them, thanks!

TCChenlong · 2021-05-13T11:21:04Z

paddle2.0_docs/malicious_webpage_detection/malicious_webpage_detection.ipynb

+   "source": [
+    "# Malicious Webpage Detection with LSTM\n",
+    "\n",
+    "**Author:** [PaddlePaddle](https://github.com/PaddlePaddle) <br>\n",


For the author field, please put your own GitHub name and link here. Thanks for your contribution!

TCChenlong · 2021-05-13T11:26:22Z

paddle2.0_docs/malicious_webpage_detection/malicious_webpage_detection.ipynb

+   "source": [
+    "## III. Building the Network\n",
+    "\n",
+    "### 3.1 Build the dataloder\n",


dataloder -> DataLoader
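
For context on what the renamed section does: `paddle.io.DataLoader` groups samples into batches, and its `collate_fn` typically pads each batch of token-id sequences to a common length (PaddleNLP provides helpers such as `paddlenlp.data.Pad` and `paddlenlp.data.Stack` for this). The plain-Python sketch below only illustrates that padding step; the function name `pad_batch`, the `(token_ids, label)` sample layout, and the pad id `0` are assumptions, not code from the notebook.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    samples: Sequence[Tuple[List[int], int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[int], List[int]]:
    """Hypothetical collate step: pad token-id lists to the batch's max length.

    Each sample is (token_ids, label). Returns the padded ids, the original
    sequence lengths (an LSTM encoder can use these to skip padding), and
    the labels.
    """
    max_len = max(len(ids) for ids, _ in samples)
    # Right-pad every sequence with pad_id up to the longest one in the batch.
    padded = [ids + [pad_id] * (max_len - len(ids)) for ids, _ in samples]
    seq_lens = [len(ids) for ids, _ in samples]
    labels = [label for _, label in samples]
    return padded, seq_lens, labels


batch = [([5, 8, 2], 1), ([7, 3], 0)]
ids, lens, labels = pad_batch(batch)
print(ids)     # [[5, 8, 2], [7, 3, 0]]
print(lens)    # [3, 2]
print(labels)  # [1, 0]
```

Returning the unpadded lengths alongside the ids matters because `paddlenlp.seq2vec.LSTMEncoder` accepts a `sequence_length` argument in its forward pass, so padding positions need not influence the sentence representation.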

TCChenlong · 2021-05-13T11:27:33Z

paddle2.0_docs/malicious_webpage_detection/malicious_webpage_detection.ipynb

+    "import paddlenlp\n",
+    "import paddle.nn as nn\n",
+    "import paddle.nn.functional as F\n",
+    "import paddlenlp as ppnlp\n",


Aliasing it like this isn't recommended; just use `paddlenlp` directly.

So I should just delete line 72?

TCChenlong

LGTM

chenxiaozeng · 2021-05-17T02:23:23Z

paddle2.0_docs/malicious_webpage_detection/malicious_webpage_detection.ipynb

+   },
+   "outputs": [],
+   "source": [
+    "!pip install lxml -i https://mirror.baidu.com/pypi/simple/\r\n",


If lxml and html5lib are not used later on, please remove them.

chenxiaozeng · 2021-05-17T02:26:10Z

paddle2.0_docs/malicious_webpage_detection/malicious_webpage_detection.ipynb

+   },
+   "outputs": [],
+   "source": [
+    "class SelfDefinedDataset(paddle.io.Dataset):\n",


PaddleNLP supports several ways to define a custom dataset; see: https://paddlenlp.readthedocs.io/zh/latest/data_prepare/dataset_self_defined.html
That said, the custom definition here is fine too.
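
One of the approaches in the linked documentation is to write a generator function that yields one dict per sample and pass it to `paddlenlp.datasets.load_dataset`. A minimal sketch, assuming a hypothetical tab-separated file with a header row followed by `text<TAB>label` lines; the field names `text` and `label` and the helper `parse_lines` are illustrative, not taken from the notebook.

```python
from typing import Dict, Iterable, Iterator, Union

Sample = Dict[str, Union[str, int]]


def parse_lines(lines: Iterable[str]) -> Iterator[Sample]:
    """Yield one {'text', 'label'} dict per 'text<TAB>label' line.

    Assumes (hypothetically) that the first line is a header row.
    """
    it = iter(lines)
    next(it, None)  # skip the header row, if any
    for line in it:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Split on the LAST tab so page text containing tabs stays intact.
        text, label = line.rsplit("\t", 1)
        yield {"text": text, "label": int(label)}


def read(data_path: str) -> Iterator[Sample]:
    """Read function in the generator-of-dicts style described in the docs."""
    with open(data_path, "r", encoding="utf-8") as f:
        yield from parse_lines(f)
```

With PaddleNLP installed, `read` would be passed following the documented pattern, e.g. `load_dataset(read, data_path="train.tsv", lazy=False)`, where `train.tsv` is a placeholder path. Keeping the parsing in `parse_lines` separates it from file I/O so it can be checked on in-memory lines.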

TCChenlong · 2021-06-02T09:50:56Z

paddle2.0_docs/malicious_webpage_detection/malicious_webpage_detection.ipynb

+    "Then a linear layer is added on top to perform the binary classification.\n",
+    "\n",
+    "- `paddle.nn.Embedding` builds the word-embedding layer\n",
+    "- `ppnlp.seq2vec.LSTMEncoder` builds the sentence-encoding layer\n",


This also needs to be changed: ppnlp -> paddlenlp

TCChenlong · 2021-06-02T09:51:18Z

paddle2.0_docs/malicious_webpage_detection/malicious_webpage_detection.ipynb

+    "            padding_idx=padding_idx)\n",
+    "\n",
+    "        # Map the word embeddings into the text semantic representation space via LSTMEncoder\n",
+    "        self.lstm_encoder = ppnlp.seq2vec.LSTMEncoder(\n",


This also needs to be changed: ppnlp -> paddlenlp

TCChenlong · 2021-06-04T07:49:50Z

paddle2.0_docs/malicious_webpage_detection/malicious_webpage_detection.ipynb

+    "# Extract all hacked (defaced) page samples\r\n",
+    "d_page = tempdf[tempdf['flag']=='d']\r\n",
+    "# Merge the samples\r\n",
+    "train_page = pd.concat([n_page,d_page],axis=0)\r\n",


The samples are merged twice here; wouldn't a single merge be enough?

Add Malicious Webpage Detection Example

ef20bca

Add Malicious Webpage Detection Example by PaddleNLP

TCChenlong reviewed May 13, 2021

make corrections based on comments

3e21f60

TCChenlong approved these changes May 14, 2021

chenxiaozeng reviewed May 17, 2021

Remove the lxml and html5lib installation

09be98f

TCChenlong reviewed Jun 2, 2021

TCChenlong reviewed Jun 4, 2021

edencfc wants to merge 3 commits into PaddlePaddle:develop from edencfc:develop

3 participants