this post was submitted on 01 Jul 2023
15 points (85.7% liked)

Programming


Hi there!
So I was hoping programming might be able to help with this. I am trying to learn how to use Selenium for personal projects.

I have tried plain Selenium, but it does not support authenticated proxies (ones that need user:pass).
I have tried SeleniumBase, but when I got it working, there were WebRTC leaks that I could not avoid.
I have tried undetected-chromedriver, but I was unable to get authenticated proxies working with that either.
For proxies, I tried Selenium Wire as well. It seems to use its own SSL certificates, which as far as I can tell are easy to detect.
My main purpose is web scraping and/or simple tasks: log in to X site, enter X info and close.

So my question is... What is the proper way to use Selenium for automating functions while avoiding detection?

Thank you in advance.

EDIT: I made this post because I am hitting a wall at every turn, and I feel like I might just be approaching this the wrong way, or maybe I'm just missing something crucial. I was hoping someone with more experience could explain to me what is the correct way of doing it without cobbling it together with duct tape.

top 17 comments
[–] starman@programming.dev 4 points 1 year ago (2 children)

Next time post python-related questions on programming.dev/c/python

[–] bugsmith@programming.dev 2 points 1 year ago (1 children)

c/learn_programming would also be a good fit for this.

If I understood correctly, crossposting is somewhat frowned upon to avoid duplicate posts on Lemmy (and I agree with that), so I will remember for next time.
Thanks!

[–] PracticalParrot@discuss.tchncs.de 1 points 1 year ago (1 children)

Sorry, thanks for sharing the link. Will do that for next time.

[–] starman@programming.dev 2 points 1 year ago

No problem, I hope you will find help there, when you need it

[–] iliketurtles@lemmy.world 4 points 1 year ago* (last edited 1 year ago) (1 children)

Not 100% sure if this is helpful, but here is what I'm importing and a snippet initiating webdriver. Sorry, on mobile. The hardest part was getting the right chromium installed and getting the path right. I believe this is the one I used:

sudo apt-get install chromium-chromedriver

```
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# Selenium 4 takes the chromedriver path through a Service object.
driver = webdriver.Chrome(service=Service("/usr/lib/chromium-browser/chromedriver"))
```
[–] PracticalParrot@discuss.tchncs.de 2 points 1 year ago (1 children)

Thanks! I appreciate your response, and love the username! :) Fortunately getting the right chrome driver working was one of the very few things that just worked for me. Have you gotten proxies working without proxy detection with this setup, and if so, what did you do?

[–] iliketurtles@lemmy.world 2 points 1 year ago

Oh, sorry haven't tried proxies. Just scraping a single site.

[–] jflorez@sh.itjust.works 4 points 1 year ago* (last edited 1 year ago) (1 children)

You can do it with plain Selenium; all you need to do is set the proxy in the browser options.

from selenium import webdriver

PROXY_WITH_PORT = "111.222.333.443:8080"
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument(f'--proxy-server={PROXY_WITH_PORT}')

# Selenium 4 takes the options object through the options= keyword.
chrome = webdriver.Chrome(options=chrome_options)
chrome.get("http://google.com")

In general you can pass any command-line argument to the browser using options. For Chrome you can find all the proxy-related options here: https://www.chromium.org/developers/design-documents/network-settings/

Also, if you are using the latest Selenium (4.10 I think), you don’t need to point to the chromedriver executable; Selenium will automatically download it if it can’t find the driver.

Edit: more information about Chrome arguments
Edit: info about driver download

[–] PracticalParrot@discuss.tchncs.de 1 points 1 year ago (2 children)

Thank you. The problem with this is that it does not support authenticated proxies though. You cannot pass user:pass through it. My janky solution was to use something like pproxy to relay the connection through my own proxy server without user:pass. It has not been fully effective though.

I will look into doing this and using IP authorization instead of user:pass. Thank you for the help.

How does vanilla Selenium fare with WebRTC or DNS leaks?

[–] jflorez@sh.itjust.works 2 points 1 year ago

You might need to install your own proxy on the machine running Selenium and then chain that proxy to the authenticated one. Then configure the driver to use the local unauthenticated proxy.
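A rough sketch of that chaining idea, using only the Python standard library: a tiny local relay accepts unauthenticated connections from the browser, adds a `Proxy-Authorization` header to the first request on each connection, and forwards everything to the upstream proxy. The upstream host, port, credentials and the local port 8899 are all made-up placeholders, and this assumes the upstream proxy accepts HTTP Basic authentication.

```python
import asyncio
import base64

# Hypothetical upstream proxy and credentials; replace with real values.
UPSTREAM_HOST = "proxy.example.com"
UPSTREAM_PORT = 8080
USERNAME = "user"
PASSWORD = "pass"
LOCAL_PORT = 8899  # arbitrary local port the browser will point at


def add_proxy_auth(head: bytes, username: str, password: str) -> bytes:
    """Insert a Basic Proxy-Authorization header right after the request line."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request_line, sep, rest = head.partition(b"\r\n")
    auth = f"Proxy-Authorization: Basic {token}".encode()
    return request_line + sep + auth + b"\r\n" + rest


async def pipe(reader, writer):
    """Copy bytes one way until the source closes, then close the sink."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass
    finally:
        writer.close()


async def handle(client_reader, client_writer):
    # Read the first request head (request line plus headers) from the browser.
    head = await client_reader.readuntil(b"\r\n\r\n")
    up_reader, up_writer = await asyncio.open_connection(UPSTREAM_HOST, UPSTREAM_PORT)
    up_writer.write(add_proxy_auth(head, USERNAME, PASSWORD))
    await up_writer.drain()
    # Everything after the first head is relayed unchanged in both directions.
    await asyncio.gather(
        pipe(client_reader, up_writer),
        pipe(up_reader, client_writer),
    )


async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", LOCAL_PORT)
    async with server:
        await server.serve_forever()

# To start the relay, run: asyncio.run(main())
```

With the relay running, the driver would be pointed at `--proxy-server=127.0.0.1:8899`. Only the first request on each connection gets the header, which covers HTTPS (one `CONNECT` per tunnel) but may not cover plain-HTTP keep-alive connections, so treat this as a starting point rather than a finished tool.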

[–] jflorez@sh.itjust.works 2 points 1 year ago

Also, since Selenium just drives an actual browser, the WebRTC and DNS leaks will be the browser’s responsibility, not Selenium’s. As long as you can locate elements on a page you will be OK.
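Since the leak behaviour is up to the browser, one thing to try is asking Chrome itself to restrict WebRTC. The sketch below assumes the Chromium switch `--force-webrtc-ip-handling-policy` with the value `disable_non_proxied_udp` is honoured by your Chrome build; I have not checked it against every version, so verify with a WebRTC leak test page. The helper names `chrome_privacy_args` and `make_driver` are made up for illustration.

```python
from typing import List


def chrome_privacy_args(proxy: str,
                        webrtc_policy: str = "disable_non_proxied_udp") -> List[str]:
    """Build Chrome command-line arguments for a proxy plus a WebRTC IP policy.

    Assumption: the installed Chrome honours the WebRTC policy switch.
    """
    return [
        f"--proxy-server={proxy}",
        f"--force-webrtc-ip-handling-policy={webrtc_policy}",
    ]


def make_driver(proxy: str):
    """Start Chrome through Selenium with the arguments above applied."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_privacy_args(proxy):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Usage would be something like `driver = make_driver("127.0.0.1:8899")`, where the address is whatever unauthenticated proxy the browser should use.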

[–] ake@lemmy.dbzer0.com 1 points 1 year ago (1 children)

have you considered using something else? like playwright/puppeteer?

[–] PracticalParrot@discuss.tchncs.de 2 points 1 year ago (1 children)

I have considered it, but since I only know Python I have yet to try it. Might be time to learn JavaScript :)

[–] ake@lemmy.dbzer0.com 6 points 1 year ago (2 children)

Wow I had no idea! Thanks for sharing I'll look into it.

After getting it working with proxies, I realized they have the exact same WebRTC issue. I assume it's for the same reason Selenium has it. Unfortunate, but there you have it.
