Python: support pathlib#5681
Conversation
6cc8a5b to
52a9040
Compare
This because `resolve` accesses the file system, I am open to not include that fact in the modelling.
that were only listed as file sytem accesses.
| // Special handling of the `/` operator | ||
| exists(BinaryExprNode slash, DataFlow::Node left | | ||
| slash.getOp() instanceof Div and | ||
| left.asCfgNode() = slash.getLeft() and | ||
| left.getALocalSource() = pathlibPath() | ||
| | | ||
| nodeTo.asCfgNode() = slash and | ||
| ( | ||
| // type-preserving call | ||
| nodeFrom = left | ||
| or | ||
| // data injection | ||
| nodeFrom.asCfgNode() = slash.getRight() | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Can we extend this to also handle if the path is the right operator?
A demonstration:
In [1]: from pathlib import Path
In [2]: "bar" / Path("foo")
Out[2]: PosixPath('bar/foo')There was a problem hiding this comment.
Aha, thanks for catching this!
Will also support both operands being paths
tausbn
left a comment
There was a problem hiding this comment.
A few suggestions for improvements. Apart from that I think this looks good. 👍
| // Type-preserving call | ||
| exists(DataFlow::AttrRead returnsPath | | ||
| returnsPath.getAttributeName() = pathlibPathMethod() | ||
| | | ||
| nodeTo.(DataFlow::CallCfgNode).getFunction() = returnsPath and | ||
| nodeFrom = returnsPath.getObject() | ||
| ) | ||
| or | ||
| // Type-preserving attribute | ||
| exists(DataFlow::AttrRead isPath | isPath.getAttributeName() = pathlibPathAttribute() | | ||
| nodeTo = isPath and | ||
| nodeFrom = isPath.getObject() | ||
| ) |
There was a problem hiding this comment.
It feels like there's a lot of overlap between this and the type tracker. Could we perhaps factor out these steps appropriately, and avoid repeating ourselves?
There was a problem hiding this comment.
I tried this, see if you like...it could potentially also be done for the injection part.
| ) | ||
| ) | ||
| or | ||
| // Data injection |
There was a problem hiding this comment.
I must admit, it's not immediately obvious to me what "Data injection" means here. 🤔
There was a problem hiding this comment.
It means that we should (taint-)track data from a non-Path-object (in)to a Path-object.
tausbn
left a comment
There was a problem hiding this comment.
I think the refactoring helped a lot, but I still think it could be improved further. I have added a few suggestions.
| /** | ||
| * Flow for type presering mehtods. | ||
| */ | ||
| private predicate typePreservingCall(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { |
There was a problem hiding this comment.
This is a very generic-sounding name (and QLDoc comment) for something that's specific to pathlib. Perhaps something with pathlib in the name would be better?
(Also, two typos: preserving methods)
There was a problem hiding this comment.
Yeah, I think I had the generalisation of this in my head...
There was a problem hiding this comment.
Alas, the typo survived. (And the name, given that it lives immediately inside the Stdlib module could really use some disambiguation.)
| /** | ||
| * Flow for type presering attributes. | ||
| */ | ||
| private predicate typePreservingAttribute(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { |
There was a problem hiding this comment.
Same as my other comment (and also another typo).
| slash.getOp() instanceof Div and | ||
| ( | ||
| pathOperand.asCfgNode() = slash.getLeft() and | ||
| dataOperand.asCfgNode() = slash.getRight() | ||
| or | ||
| pathOperand.asCfgNode() = slash.getRight() and | ||
| dataOperand.asCfgNode() = slash.getLeft() | ||
| ) and | ||
| pathOperand.getALocalSource() = pathlibPath() | ||
| | | ||
| nodeTo.asCfgNode() = slash and | ||
| nodeFrom in [ | ||
| // type-preserving call | ||
| pathOperand, | ||
| // data injection | ||
| dataOperand | ||
| ] |
There was a problem hiding this comment.
Does this not just boil down to the following?
slash.getOp() instanceof Div and
nodeFrom.asCfgNode() = slash.getAnOperand() and
nodeFrom.getALocalSource() = pathlibPath() and
nodeTo.asCfgNode() = slash(In which case, more sharing with the pathlibPath typetracker might be possible.)
There was a problem hiding this comment.
Almost, except the argument of type path lib.Path does not have to be nodeFrom. It still does become much shorter this way, though 👍
There was a problem hiding this comment.
Sharing is a bit tricky for data injection, because the taint step would prefer to refer to the type tracker to limit the set of nodes.
Co-authored-by: Taus <tausbn@github.com>
tausbn
left a comment
There was a problem hiding this comment.
Went through the PR again. There are still some points I would like to see addressed, but I don't want this to turn into an eternal PR, so feel free to ignore my suggestions if you disagree with them. We can always clean this up later.
I believe the code is functionally correct.
With that in mind, I've marked this as approved.
| /** | ||
| * Flow for type presering mehtods. | ||
| */ | ||
| private predicate typePreservingCall(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { |
There was a problem hiding this comment.
Alas, the typo survived. (And the name, given that it lives immediately inside the Stdlib module could really use some disambiguation.)
Co-authored-by: Taus <tausbn@github.com>
tausbn
left a comment
There was a problem hiding this comment.
Alright, let's see if it floats. 🚢
Adds support for
pathlib.Paths by tracking these objects with a type tracker and adding appropriate taint steps.This enables a new extension of
FileSystemAccessbase onpathlib.Pathobjects.I included most of the API since the QLDoc for
FileSystemAccessis quite broad, but I am open to dropping calls such asresolveoris_fifowhich may be harder to exploit.Currently missing:
cwdandhome