Set custom temporary directory used by FF and delete it#1093
Open
ndanner-wesleyancs wants to merge 18 commits intoopenwpm:masterfrom
Open
Set custom temporary directory used by FF and delete it#1093ndanner-wesleyancs wants to merge 18 commits intoopenwpm:masterfrom
ndanner-wesleyancs wants to merge 18 commits intoopenwpm:masterfrom
Conversation
instance and delete it when the browser quits. This is a workaround for an issue with geckodriver. When the OpenWPM extension is installed via `WebDriver.install_addon()`, geckodriver makes a copy of the XPI file in TMPDIR. However, geckodriver never deletes that file. So on a stateless crawl, you end up with one copy of the XPI file for each site visited. This workaround sets TMPDIR in the environment before creating the geckodriver service, and then deletes the directory after `driver.quit()` returns in `BrowserManager.run()`. We use this indirection because we don't have access to the name of the temporary file, and it doesn't seem safe to just delete XPI files in /tmp willy-nilly.
- Temporary directory is created in `BrowserManagerHandle.launch_browser_manager`. - Temporary directory is deleted in `BrowserManagerHandle.shutdown_browser`. - Temporary directory is added to environment for `selenium.webdriver.firefox.service.Service` in `deploy_browsers.deploy_firefox()
`BrowserManagerHandle.close_browser_manager`.
`BrowserParams.tmpdir` is now `Optional[Path]`. Value `None` means no temporary directory has been set, `Some p` means it has been set to `p`. When the browser manager is launched, if `tmpdir` is not `None`, try to delete it, assuming that it is leftover from some failure that prevented normal deletion. Regardless, then make a new temporary directory. Make sure to set it to `None` when the temporary directory is deleted during normal cleanup.
BrowserManagerHandle.close_browser_manager.
instance and delete it when the browser quits. This is a workaround for an issue with geckodriver. When the OpenWPM extension is installed via `WebDriver.install_addon()`, geckodriver makes a copy of the XPI file in TMPDIR. However, geckodriver never deletes that file. So on a stateless crawl, you end up with one copy of the XPI file for each site visited. This workaround sets TMPDIR in the environment before creating the geckodriver service, and then deletes the directory after `driver.quit()` returns in `BrowserManager.run()`. We use this indirection because we don't have access to the name of the temporary file, and it doesn't seem safe to just delete XPI files in /tmp willy-nilly.
- Temporary directory is created in `BrowserManagerHandle.launch_browser_manager`. - Temporary directory is deleted in `BrowserManagerHandle.shutdown_browser`. - Temporary directory is added to environment for `selenium.webdriver.firefox.service.Service` in `deploy_browsers.deploy_firefox()
`BrowserManagerHandle.close_browser_manager`.
`BrowserParams.tmpdir` is now `Optional[Path]`. Value `None` means no temporary directory has been set, `Some p` means it has been set to `p`. When the browser manager is launched, if `tmpdir` is not `None`, try to delete it, assuming that it is leftover from some failure that prevented normal deletion. Regardless, then make a new temporary directory. Make sure to set it to `None` when the temporary directory is deleted during normal cleanup.
BrowserManagerHandle.close_browser_manager.
vringar
requested changes
Jul 8, 2024
Contributor
vringar
left a comment
There was a problem hiding this comment.
Hey,
thanks for you contribution.
Could you create a test where you assert that the original TMPDIR doesn't contain any .xpi and also address the mypy issues?
Or is this contribution done from your side and if I want those things I need to do them myself?
Author
|
I'll try to work on these in the next couple of weeks. |
`webdriver.Firefox` in `deploy_browsers/deploy_firefox.py`.
… into delete_tmpdir
Author
|
I have fixed the |
Author
|
I'm a little perplexed about your suggested test. The original TMPDIR (typically |
… into delete_tmpdir
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request modifies OpenWPM as follows:
geckodriveris started, the TMPDIR environment variable is set in the environment used bygeckodriver.The proximate rationale for doing this is that when
geckodriveradds the OpenWPM extension to FF, it makes a copy of the XPI file in its temporary directory. Unfortunately, it doesn't delete it. That means one copy of the XPI file for each browser that is launched is left behind in/tmp. On a big stateless crawl, that's a lot of copies. We can't find out the name of the temporary file thatgeckodrivercreates to delete it ourselves, and we don't want to just delete all XPI files in/tmpbecause we don't actually know to whom the all belong. This approach seems to resolve these problems.This patch is based on v.0.28.0.
See #1090.